Clustering View-Segmented Documents via Tensor Modeling
Authors
Abstract
We propose a clustering framework for view-segmented documents, i.e., relatively long documents made up of smaller fragments that correspond to a target set of views or aspects. The framework encodes a view-based document segmentation into a third-order tensor model, whose decomposition enables any standard document clustering algorithm to better reflect the multifaceted nature of the documents. Experimental results on document collections featuring paragraph-based, metadata-based, or user-driven views demonstrate the benefit of the proposed approach, showing improved performance in the document clustering task.
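The abstract does not specify the tensor weighting, the decomposition method, or the clustering algorithm, so the pipeline can only be sketched under stated assumptions. The sketch below assumes raw term-frequency counts in a term × document × view tensor, a truncated higher-order SVD (a Tucker-style decomposition) as a stand-in for the paper's decomposition, and k-means on the document-mode factor matrix. The function names (`build_tensor`, `hosvd`, `kmeans`) and the input format (each document given as a list of per-view token lists) are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def build_tensor(docs, vocab, n_views):
    """Build a term x document x view tensor of raw term counts (assumed weighting).

    Hypothetical input format: each document is a list of `n_views` segments,
    and each segment is a list of tokens.
    """
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs), n_views))
    for d, segments in enumerate(docs):
        for v, tokens in enumerate(segments):
            for tok in tokens:
                if tok in index:
                    X[index[tok], d, v] += 1.0
    return X

def hosvd(X, ranks):
    """Truncated HOSVD: one factor matrix per mode plus the projected core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = X
    for mode, U in enumerate(factors):
        # Mode-n product of the core with U^T, keeping the original axis order.
        moved = np.moveaxis(core, mode, 0)
        core = np.moveaxis(np.tensordot(U.T, moved, axes=1), 0, mode)
    return core, factors

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its closest chosen centroid.
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(dists)])
    centroids = np.array(centroids)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

# Toy corpus: four documents, each split into two views (illustrative data only).
docs = [
    [["tensor", "matrix"], ["tensor"]],
    [["matrix"], ["tensor", "matrix"]],
    [["goal", "match"], ["goal"]],
    [["match"], ["goal", "match"]],
]
vocab = ["tensor", "matrix", "goal", "match"]
X = build_tensor(docs, vocab, n_views=2)
core, (terms_f, docs_f, views_f) = hosvd(X, ranks=(2, 2, 2))
labels = kmeans(docs_f, k=2)  # cluster documents by their document-mode factors
print(labels)  # → [0 0 1 1]
```

Clustering the rows of the document-mode factor matrix is one simple way to let a standard algorithm consume the decomposition; any other clustering method could be substituted at that step.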
Similar resources
Semantic-Based Multilingual Document Clustering via Tensor Modeling
A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue, we propose a ne...
Multi-View Subspace Clustering via Relaxed L1-Norm of Tensor Multi-Rank
In this paper, we address the multi-view subspace clustering problem. Our method utilizes the circulant algebra for a tensor that is constructed by stacking the subspace representation matrices of different views and then shifting them, to explore the high-order correlations underlying multi-view data. By introducing a recently proposed tensor factorization, namely tensor-Singular Value Decomposition...
Multi-Camera Visual Surveillance for Motion Detection, Occlusion Handling, Tracking and Event Recognition
This paper presents novel approaches for background modeling, occlusion handling and event recognition using multi-camera configurations, which can overcome the limitations of single-camera configurations. The main novelty of the proposed background modeling approach is building a multivariate Gaussian background model for each pixel of the reference camera by utilizing homography-re...
Using Clustering Techniques for Non-Segmented Language Document Management: A Comparison of K-mean and Self Organizing Map Techniques
Since the number of electronic non-segmented language documents is growing very fast, efficient document clustering techniques for non-segmented languages are needed in today’s world, where many documents are stored and retrieved electronically. Document clustering enables one to group similar documents using keywords or terms of the clusters. Thus document clustering can be used to group and ...
Robust Kernelized Multi-View Self-Representations for Clustering by Tensor Multi-Rank Minimization
Most recently, tensor-SVD has been applied to multi-view self-representation clustering and has achieved promising results in many real-world applications such as face clustering, scene clustering and generic object clustering. However, tensor-SVD based multi-view self-representation clustering was originally proposed to solve the clustering problem in multiple linear subspaces, leading to...